Introduction to R

Liam D. Bailey & Alexandre Courtiol

Installing R and RStudio

What is R?

What is RStudio?

R packages

Installing and loading R packages


install.packages("tidyverse")


library(tidyverse)
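A minimal sketch of a common pattern: install a package only when it is missing, then load it. The helper name `install_if_missing` is hypothetical (it is not part of R or of the course material), and the demonstration uses the `stats` package, which ships with R, so nothing is downloaded.

```r
## Hypothetical helper: install a package only if it is not already available
install_if_missing <- function(pkg) {
  already_installed <- requireNamespace(pkg, quietly = TRUE)
  if (!already_installed) {
    install.packages(pkg)
  }
  ## Return (invisibly) whether the package was already installed
  invisible(already_installed)
}

## 'stats' ships with R, so no installation is triggered here
was_installed <- install_if_missing("stats")
was_installed

## Load the package once it is installed
library("stats")
```

Installing is needed once per computer; `library()` is needed once per R session.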

Packages for this course

Note

INSTALL THESE ON YOUR SYSTEM NOW!

install.packages("adegenet")
install.packages("pegas")
install.packages("poppr")
install.packages("hierfstat")
install.packages("lattice")

Updating R packages

packageVersion("ggplot2")
[1] '3.4.4'

Good practice

Good practice

Set RStudio options

Good practice

Save your script in a file

We’ll talk about RMarkdown later

Good practice

Use comments to understand your code better

# EXPLAIN WHAT THE CODE DOES
my_code_here

Basics of R: Functions

Functions


Functions (generally) take an input and return an output. For example, the function sum() takes a numeric vector and returns a single value.

## Compute the sum of some numbers
sum(c(1, 3, 5, 1))
[1] 10

Note

We’ll discuss what a numeric vector is soon.
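Many functions also take extra arguments that change how they behave. The sketch below uses two base R functions, round() and sum(), to show arguments passed by name; the variable names are only illustrative.

```r
## Arguments can be passed by name: round to 2 decimal places
rounded <- round(3.14159, digits = 2)
rounded

## By default, a missing value (NA) makes the whole sum missing
total_with_na <- sum(c(1, 3, NA, 1))
total_with_na

## na.rm = TRUE tells sum() to ignore missing values
total_without_na <- sum(c(1, 3, NA, 1), na.rm = TRUE)
total_without_na
```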

Functions


Troubleshoot using ? or help()

## Help documentation of the function sum()
help(sum)
?sum

The help page tells you which package the function comes from, what the function does, and which arguments it takes.

Functions

Warning

Different packages might have functions with the same name!

package::function() is explicit about which package to use.

## In case of ambiguity, use :: and specify the package
base::sum(c(1, 3, 5, 1))
[1] 10
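If you are unsure where a function comes from, one option is find() from the utils package, which reports where a name is defined among the currently loaded packages. This is a small sketch assuming a fresh R session with only the default packages loaded.

```r
## Ask R where the names "sum" and "sd" are defined
sum_location <- utils::find("sum")
sd_location <- utils::find("sd")
sum_location
sd_location

## The explicit package::function() form gives the same result
## as the short form when there is no name clash
same_result <- stats::sd(c(1, 3, 5, 1)) == sd(c(1, 3, 5, 1))
same_result
```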

Functions


If R cannot run your code, it will display an error…

sum(-)
Error: <text>:1:6: unexpected ')'
1: sum(-)
         ^

…but these errors aren’t always easy to read.
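Errors can also happen while a function is running, for example when it receives the wrong type of input. As a sketch that goes slightly beyond these slides, tryCatch() (base R) can capture such an error so that its message can be inspected. Syntax errors like the one above cannot be captured this way, because they occur before any code runs.

```r
## sum() cannot add character strings, so this call fails at run time.
## tryCatch() intercepts the error and returns its message as text.
error_message <- tryCatch(
  sum("a"),
  error = function(e) conditionMessage(e)
)

## The message is now an ordinary character string
error_message
```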

Basics of R: Object types

Object types


Basic kinds of R objects (or ‘classes’)

## Character string
"word"
[1] "word"
## Numeric
1.2
[1] 1.2
## Logical (TRUE/FALSE)
TRUE
[1] TRUE
## Factor
factor(c("A", "B"), levels = c("A", "B"))
[1] A B
Levels: A B

Object types


Use str() if you’re unsure!

str("word")
 chr "word"
str(1.2)
 num 1.2
str(TRUE)
 logi TRUE
str(factor(c("A", "B")))
 Factor w/ 2 levels "A","B": 1 2
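Besides str(), the base R function class() returns the class name as a character string, and the is.*() family tests for one particular type. A short sketch:

```r
## class() returns the name of the class
class("word")                # "character"
class(1.2)                   # "numeric"
class(TRUE)                  # "logical"
class(factor(c("A", "B")))   # "factor"

## is.*() functions answer a yes/no question about the type
is.numeric(1.2)              # TRUE
is.character(1.2)            # FALSE
```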

Object types


Assign an object with <- or ->

## Save value to use later!
my_object <- "A"

## Check what object we just created!
str(my_object)
 chr "A"
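The right-pointing arrow works the same way, with the value on the left and the name on the right. In this sketch, `other_object` and `combined` are illustrative names.

```r
## Left assignment: name <- value
my_object <- "A"

## Right assignment: value -> name
"B" -> other_object

## Stored objects can be reused later
combined <- paste(my_object, other_object)
combined   # "A B"
```

Most R code uses `<-`, so `->` is mainly worth recognising when you read other people's scripts.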

Vector

Vector


\(\geq\) 1 values of the same type

## Create a vector of numeric
my_vector <- c(1.2, 3.4, 0.1)

## Check the structure...
str(my_vector)
 num [1:3] 1.2 3.4 0.1

Vector


\(\geq\) 1 values of the same type

## A vector *coerces* everything to be the same
my_vector <- c(115.3, -0.1, "2")

## Notice everything is character!
str(my_vector)
 chr [1:3] "115.3" "-0.1" "2"
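Coercion follows a fixed order: logical values become numeric, and numeric values become character. The sketch below shows this, together with as.numeric() (base R) for converting text back to numbers; the variable names are illustrative.

```r
## Logical and numeric together: everything becomes numeric
## (TRUE becomes 1, FALSE becomes 0)
mixed_logical <- c(TRUE, 2, FALSE)
class(mixed_logical)   # "numeric"

## Convert a character vector back to numeric
as_numbers <- as.numeric(c("115.3", "-0.1", "2"))
class(as_numbers)      # "numeric"

## Text that is not a number becomes NA (R normally warns about this;
## the warning is silenced here only to keep the example quiet)
not_a_number <- suppressWarnings(as.numeric(c("1", "A")))
is.na(not_a_number)    # FALSE TRUE
```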

Vector


Vectors have 1 dimension (a length)

## Find how many values are in your vector!
length(my_vector)
[1] 3

Vector


Select particular values using ‘indexing’ with []

## 'Index' a vector using []
my_vector <- c("A", 115.3, -0.1)

## Find the first value in the vector
my_vector[1]
[1] "A"
## Find the first and third values in the vector
my_vector[c(1, 3)]
[1] "A"    "-0.1"
## Find everything *except* the second value in the vector
my_vector[-2]
[1] "A"    "-0.1"
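Indexing also works with a logical vector (TRUE/FALSE), which lets you select values by a condition rather than by position. A minimal sketch with an illustrative vector called `sizes`:

```r
sizes <- c(1.2, 3.4, 0.1)

## A comparison returns one TRUE/FALSE per value
is_large <- sizes > 1
is_large             # TRUE TRUE FALSE

## Use the logical vector inside [] to keep only the TRUE positions
large_sizes <- sizes[is_large]
large_sizes          # 1.2 3.4

## which() gives the positions that are TRUE
which(is_large)      # 1 2
```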

Vector


Watch out for missing data.

# This numeric vector has some unusual values
missing_data <- c(NULL, 1.1, 0.2, NA, 7, NaN, Inf)

# NULL: Empty (c() drops it, so the vector has 6 values, not 7)
# NA: Missing data (can be any type)
# NaN: Not a number (specific to numeric)
# Inf: Infinity
str(missing_data)
 num [1:6] 1.1 0.2 NA 7 NaN ...
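These special values can be detected with is.na(), is.nan() and is.finite() (all base R). The sketch below uses them to keep only the ordinary numbers before computing a mean; `clean_data` and `clean_mean` are illustrative names.

```r
missing_data <- c(NULL, 1.1, 0.2, NA, 7, NaN, Inf)

## is.na() is TRUE for both NA and NaN
is.na(missing_data)

## is.nan() is TRUE only for NaN
is.nan(missing_data)

## is.finite() is TRUE only for ordinary numbers (not NA, NaN or Inf)
is.finite(missing_data)

## Keep the ordinary numbers and compute their mean
clean_data <- missing_data[is.finite(missing_data)]
clean_mean <- mean(clean_data)
clean_mean
```

Whether dropping such values is appropriate depends on the analysis; the point here is only how to find them.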

Matrix

Matrix


\(\geq\) 1 values of the same type with two dimensions

## A vector only has one dimension (length)
c(1, 2, 3, 4)
[1] 1 2 3 4
## A matrix has two dimensions (number rows and number columns)
## Create a 2x2 dimensional matrix
my_matrix <- matrix(c(1, 2, 3, 4),
                    nrow = 2, ncol = 2)

## Check the structure...
str(my_matrix)
 num [1:2, 1:2] 1 2 3 4

Matrix


\(\geq\) 1 values of the same type with two dimensions

## A matrix will also coerce values!
my_matrix <- matrix(c("1", 2, 3, 4),
                    # Create a 2x2 matrix
                    nrow = 2, ncol = 2)

## Everything is character!
str(my_matrix)
 chr [1:2, 1:2] "1" "2" "3" "4"

Matrix


\(\geq\) 1 values of the same type with two dimensions

## Find the number of rows in my matrix
nrow(my_matrix)
[1] 2
## Find the number of columns in my matrix
ncol(my_matrix)
[1] 2

Matrix

Index a matrix with [].

Warning

Remember, now we have two dimensions. So we index with ROW then COLUMN.

## Find the value at row 1 and column 2
my_matrix[1, 2]
[1] "3"
## Find all values in row 1
my_matrix[1, ]
[1] "1" "3"
## Find all values in column 2
my_matrix[, 2]
[1] "3" "4"
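The value at row 1, column 2 is the third input value because matrix() fills column by column by default. The sketch below compares this with the `byrow = TRUE` argument of matrix(); `by_column` and `by_row` are illustrative names.

```r
## Default: values fill the first column, then the second
by_column <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
by_column[1, 2]   # 3

## byrow = TRUE: values fill the first row, then the second
by_row <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)
by_row[1, 2]      # 2

## dim() returns both dimensions at once (rows, then columns)
dim(by_row)       # 2 2
```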

List

List

Contains any number of items.

Each item can be a different type.

## Storing in a vector or matrix coerces everything to be the same
c(c(1, 2, 3), c("A", "B", "C"), c(3, 4, 5))
[1] "1" "2" "3" "A" "B" "C" "3" "4" "5"
## Use lists to store character and numeric data in one object
my_list <- list(c(1, 2, 3),
                c("A", "B", "C"),
                c(3, 4, 5))

my_list
[[1]]
[1] 1 2 3

[[2]]
[1] "A" "B" "C"

[[3]]
[1] 3 4 5

List

Contains any number of items.

Each item can be a different type.

## Check the structure
str(my_list)
List of 3
 $ : num [1:3] 1 2 3
 $ : chr [1:3] "A" "B" "C"
 $ : num [1:3] 3 4 5

List


A list has one dimension (length: the number of items in the list)

## Number of items in the list
length(my_list)
[1] 3

List

We can index a list with [] and [[]].

Warning

They have slightly different meanings!

## Use `[]` to create a smaller list
## Create a new list with item 1 and 3
my_list[c(1, 3)]
[[1]]
[1] 1 2 3

[[2]]
[1] 3 4 5
## Use `[[]]` to access the items inside the list
## Return list item 1
my_list[[1]]
[1] 1 2 3
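One way to see the difference is to check the class of each result: `[]` always returns a list, while `[[]]` returns the item itself. A short sketch:

```r
my_list <- list(c(1, 2, 3),
                c("A", "B", "C"),
                c(3, 4, 5))

## Single brackets return a (smaller) list
class(my_list[1])      # "list"

## Double brackets return the item stored inside
class(my_list[[1]])    # "numeric"

## Combine both: take item 2, then its second value
second_letter <- my_list[[2]][2]
second_letter          # "B"
```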

List

When list-elements are named, they can be accessed using either [[]] or $.

## Create a named list
my_named_list <- list(first = c(1, 2, 3),
                      second = c("A", "B", "C"),
                      third = c(3, 4, 5))
str(my_named_list)
List of 3
 $ first : num [1:3] 1 2 3
 $ second: chr [1:3] "A" "B" "C"
 $ third : num [1:3] 3 4 5
## Use `[[]]` to access the items inside the list
my_named_list[["first"]] # same as my_named_list[[1]]
[1] 1 2 3
## Use `$` to access the items inside the list
my_named_list$first
[1] 1 2 3
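Names can also be listed with names() and used to add new items. A minimal sketch, in which the item name `fourth` is made up for illustration:

```r
my_named_list <- list(first = c(1, 2, 3),
                      second = c("A", "B", "C"),
                      third = c(3, 4, 5))

## List the names of all items
names(my_named_list)    # "first" "second" "third"

## Assigning to a new name adds an item to the list
my_named_list$fourth <- TRUE
length(my_named_list)   # 4
```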

List

Note

Many advanced functions will store their output as a list object.

Remember, you can use str() to understand them better.

# These objects have a nice print output
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)
model

Call:
lm(formula = Sepal.Length ~ Sepal.Width, data = iris)

Coefficients:
(Intercept)  Sepal.Width  
     6.5262      -0.2234  

List

Note

Many advanced functions will store their output as a list object.

Remember, you can use str() to understand them better.

# Internally, they have a lot more information!
str(model)
List of 12
 $ coefficients : Named num [1:2] 6.526 -0.223
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "Sepal.Width"
 $ residuals    : Named num [1:150] -0.644 -0.956 -1.111 -1.234 -0.722 ...
  ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
 $ effects      : Named num [1:150] -71.566 -1.188 -1.081 -1.187 -0.759 ...
  ..- attr(*, "names")= chr [1:150] "(Intercept)" "Sepal.Width" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:150] 5.74 5.86 5.81 5.83 5.72 ...
  ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
 $ assign       : int [1:2] 0 1
 $ qr           :List of 5
  ..$ qr   : num [1:150, 1:2] -12.2474 0.0816 0.0816 0.0816 0.0816 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:150] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "Sepal.Width"
  .. ..- attr(*, "assign")= int [1:2] 0 1
  ..$ qraux: num [1:2] 1.08 1.02
  ..$ pivot: int [1:2] 1 2
  ..$ tol  : num 1e-07
  ..$ rank : int 2
  ..- attr(*, "class")= chr "qr"
 $ df.residual  : int 148
 $ xlevels      : Named list()
 $ call         : language lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
 $ terms        :Classes 'terms', 'formula'  language Sepal.Length ~ Sepal.Width
  .. ..- attr(*, "variables")= language list(Sepal.Length, Sepal.Width)
  .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "Sepal.Length" "Sepal.Width"
  .. .. .. ..$ : chr "Sepal.Width"
  .. ..- attr(*, "term.labels")= chr "Sepal.Width"
  .. ..- attr(*, "order")= int 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(Sepal.Length, Sepal.Width)
  .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:2] "Sepal.Length" "Sepal.Width"
 $ model        :'data.frame':  150 obs. of  2 variables:
  ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula'  language Sepal.Length ~ Sepal.Width
  .. .. ..- attr(*, "variables")= language list(Sepal.Length, Sepal.Width)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "Sepal.Length" "Sepal.Width"
  .. .. .. .. ..$ : chr "Sepal.Width"
  .. .. ..- attr(*, "term.labels")= chr "Sepal.Width"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(Sepal.Length, Sepal.Width)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "Sepal.Length" "Sepal.Width"
 - attr(*, "class")= chr "lm"
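Because the model is a list, its parts can be extracted with `$` or `[[]]`, just like any other named list. The sketch below pulls out a few of the items shown in the str() output above; `slope` is an illustrative name.

```r
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

## The names of the 12 items stored in the model
names(model)

## Residual degrees of freedom
model$df.residual   # 148

## The estimated slope for Sepal.Width
slope <- model$coefficients[["Sepal.Width"]]
slope

## Many model objects also have helper functions, e.g. coef()
coef(model)
```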

Data frame

Data frame

A special type of list:

  • Each item in the list (i.e. column) is a vector (same type)
  • All items in the list are the same length
  • Each item has a unique name

## Create our own data frame!
my_df <- data.frame(col1 = c(1, 2, 3, 4),
                    col2 = c("A", "B", "C", "D"),
                    col3 = c(3, 4, 5, 6))

my_df
  col1 col2 col3
1    1    A    3
2    2    B    4
3    3    C    5
4    4    D    6

Data frame

A special type of list:

  • Each item in the list (i.e. column) is a vector (same type)
  • All items in the list are the same length
  • Each item has a unique name

## Check the structure
str(my_df)
'data.frame':   4 obs. of  3 variables:
 $ col1: num  1 2 3 4
 $ col2: chr  "A" "B" "C" "D"
 $ col3: num  3 4 5 6

Data frame


A data frame has two dimensions (number of rows and number of columns).

## Number of rows...
nrow(my_df)
[1] 4
## Number of columns...
ncol(my_df)
[1] 3

Data frame


We can index using [] (just like a matrix) or using column names

## Items in rows 1 and 2 from column 2
my_df[c(1, 2), 2]
[1] "A" "B"
## All items in column 2
my_df[, 2]
[1] "A" "B" "C" "D"
## All items in column named 'col2'
my_df$col2
[1] "A" "B" "C" "D"
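Rows can also be selected with a condition, and new columns can be added by name. The sketch below rebuilds the same small data frame; `large_rows`, `one_column_df` and `col4` are illustrative names.

```r
my_df <- data.frame(col1 = c(1, 2, 3, 4),
                    col2 = c("A", "B", "C", "D"),
                    col3 = c(3, 4, 5, 6))

## Keep only the rows where col1 is greater than 2 (all columns)
large_rows <- my_df[my_df$col1 > 2, ]
large_rows

## drop = FALSE keeps the result as a data frame instead of a vector
one_column_df <- my_df[, 2, drop = FALSE]
str(one_column_df)

## Create a new column from existing ones
my_df$col4 <- my_df$col1 + my_df$col3
my_df$col4   # 4 6 8 10
```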

Data frame


Usually we create a data frame by reading in a .csv file!

## Read a .csv file into a data frame
iris_df <- read.csv("iris.csv")

str(iris_df)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
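If you do not have an iris.csv file at hand, one way to try this out is to write R's built-in `iris` data set to a temporary file and read it back. This is a sketch under that assumption; `csv_path` and `iris_factor_df` are illustrative names.

```r
## Write the built-in iris data to a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

## Read it back in as a data frame
iris_df <- read.csv(csv_path)
dim(iris_df)   # 150 5

## Optionally read text columns as factors instead of character
iris_factor_df <- read.csv(csv_path, stringsAsFactors = TRUE)
levels(iris_factor_df$Species)
```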

Data frame

Use functions head(), tail(), or summary() to investigate a large data frame.

## A summary of all the columns
summary(iris_df)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
   Species         
 Length:150        
 Class :character  
 Mode  :character  
                   
                   
                   

Data frame

Use functions head(), tail(), or summary() to investigate a large data frame.

## The first few rows of data...
head(iris_df)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
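Both head() and tail() accept an argument `n` to control how many rows are shown. A short sketch using R's built-in `iris` data set, so that it runs without any file:

```r
## The first three rows
first_three <- head(iris, n = 3)
first_three

## The last two rows
last_two <- tail(iris, n = 2)
last_two

## tail() keeps the original row names
rownames(last_two)   # "149" "150"
```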

Using RMarkdown

Using RMarkdown

An RMarkdown (.Rmd) file is a great way to record and share your analyses!

Using RMarkdown

Include code and output in the same document.

```{r}
## Write your code inside these 'chunks'
c(1, 2, 3, 4)
```
[1] 1 2 3 4

Using RMarkdown

Include plots to make a report.

```{r}
plot(Sepal.Length ~ Sepal.Width, data = iris_df)
```

Using RMarkdown

Write plain text to keep notes

# Section header

## Section subheader

Some notes about my code **in bold**. Below I include my code chunk...

```{r}
1 + 1
```

Some more notes...

Using RMarkdown

‘knit’ your notes to create a report

# Section header

## Section subheader

Some notes about my code **in bold**. Below I include my code chunk...

```{r}
1 + 1
```

Some more notes...

Using RMarkdown

Note

TEST YOUR KNOWLEDGE

Using RMarkdown

  • Create a new RMarkdown file in RStudio (File > New File > RMarkdown)

  • Create a new chunk of R code:

mean(c(1, 2, "3", 4, NA))

  • Knit the document to html.

  • Check the document. Does the code work properly? Can you work out why?

  • BONUS: Search for the RMarkdown Cheatsheet online and try adding some headers and bold text.

  • BONUS: Knit the document to PDF.